Stroke-based sketched symbol reconstruction and segmentation
Hand-drawn objects usually consist of multiple semantically meaningful parts.
For example, a stick figure consists of a head, a torso, and pairs of legs and
arms. Efficient and accurate identification of these subparts promises to
significantly improve algorithms for stylization, deformation, morphing and
animation of 2D drawings. In this paper, we propose a neural network model that
segments symbols into stroke-level components. Our segmentation framework has
two main elements: a fixed feature extractor and a Multilayer Perceptron (MLP)
network that identifies a component from that feature. As the feature
extractor, we utilize the encoder of stroke-rnn, our newly proposed
generative Variational Auto-Encoder (VAE) model that reconstructs symbols on a
stroke-by-stroke basis. Experiments show that a single encoder can be reused
to segment multiple categories of sketched symbols with a negligible effect
on segmentation accuracy. Our segmentation scores surpass those of existing
methods on an available small state-of-the-art dataset. Moreover,
extensive evaluations on our newly annotated large dataset demonstrate that our
framework achieves significantly better accuracy than baseline
models. We release the dataset to the community.
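The two-part design described above, a fixed feature extractor feeding a small MLP, can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the encoder here is a hand-crafted placeholder for the stroke-rnn VAE encoder, and all dimensions, weights, and component names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(stroke_points):
    """Stand-in for the stroke-rnn VAE encoder: maps one stroke
    (an N x 2 array of points) to a fixed-length feature vector.
    Here we use simple summary statistics as a placeholder."""
    return np.concatenate([stroke_points.mean(axis=0),
                           stroke_points.std(axis=0),
                           [len(stroke_points)]])

def mlp_classify(feature, W1, b1, W2, b2):
    """One-hidden-layer MLP mapping a stroke feature to component logits."""
    h = np.maximum(0.0, feature @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                      # logits over components

# Toy sizes: 5-dim feature, 8 hidden units, 3 components
# (e.g., head / torso / limb of a stick figure).
W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

stroke = rng.normal(size=(20, 2))            # one sketched stroke
feat = frozen_encoder(stroke)
logits = mlp_classify(feat, W1, b1, W2, b2)
label = int(np.argmax(logits))               # predicted component index
print(label)
```

Because the encoder is frozen, only the small MLP head would need to be trained per category, which is what makes reusing a single encoder across symbol categories cheap.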
Multimodal Group Activity Dataset for Classroom Engagement Level Prediction
We collected a new dataset that includes approximately eight hours of
audiovisual recordings of a group of students and their self-evaluation scores
for classroom engagement. The dataset and data analysis scripts are available
on our open-source repository. We developed baseline face-based and
group-activity-based image and video recognition models. Our image models yield
45-85% test accuracy with face-area inputs on the person-based classification
task. Our video models achieve up to 71% test accuracy on group-level
prediction using group-activity video inputs. In this technical report, we
share the details of our end-to-end, human-centered engagement analysis
pipeline, from data collection to model development.
Feature Point Detection and Curve Approximation for Early Processing of Freehand Sketches
Freehand sketching is a natural and crucial part of design, yet it is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects using feature point detection and approximation. We demonstrate how multiple sources of information can be combined for feature detection in strokes, and we apply this technique using two approaches to signal processing: one using simple average-based thresholding and a second using scale space.
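The average-based thresholding idea can be illustrated with a minimal sketch (an assumption-laden toy, not the paper's algorithm): compute the turning angle at each interior point of a stroke and flag points whose turning exceeds the stroke's average as candidate feature (corner) points.

```python
import numpy as np

def corner_candidates(points):
    """Flag candidate feature points where the turning angle exceeds the
    stroke's average turning angle: a minimal version of simple
    average-based thresholding."""
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)                   # segment vectors
    ang = np.arctan2(d[:, 1], d[:, 0])       # segment directions
    turn = np.abs(np.diff(np.unwrap(ang)))   # turning angle at interior points
    thresh = turn.mean()                     # average-based threshold
    # return indices into the original point list (interior points only)
    return [i + 1 for i, t in enumerate(turn) if t > thresh]

# An L-shaped polyline: the corner sits at index 2.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(corner_candidates(stroke))  # → [2]
```

A scale-space variant would instead smooth the stroke at multiple scales and keep feature points that persist across scales; combining both sources of evidence is the point of the approach described above.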
Towards Building Child-Centered Machine Learning Pipelines: Use Cases from K-12 and Higher-Education
Researchers and policy-makers have started creating frameworks and guidelines
for building machine-learning (ML) pipelines with a human-centered lens.
Machine-learning pipelines comprise all the steps needed to develop an ML
system (e.g., a predictive keyboard). Meanwhile, a child-centered focus in
developing ML systems has recently been gaining interest as children become
users of these products. These efforts predominantly focus on children's
interaction with ML-based systems. However, in our experience, ML pipelines
have yet to be adapted through a child-centered lens. In this paper, we list
the questions we ask ourselves when adapting human-centered ML pipelines to
child-centered ones. We also summarize two case studies of building
end-to-end ML pipelines for children's products.
Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification
Character re-identification, recognizing characters consistently across
different panels in comics, presents significant challenges due to limited
annotated data and complex variations in character appearances. To tackle this
issue, we introduce a robust semi-supervised framework that combines metric
learning with a novel 'Identity-Aware' self-supervision method based on
contrastive learning of face and body pairs of characters. Our approach
involves processing
both facial and bodily features within a unified network architecture,
facilitating the extraction of identity-aligned character embeddings that
capture individual identities while preserving the effectiveness of face and
body features. This integrated character representation enhances feature
extraction and improves character re-identification compared to
re-identification by face or body independently, offering a parameter-efficient
solution. By extensively validating our method using in-series and inter-series
evaluation metrics, we demonstrate its effectiveness in consistently
re-identifying comic characters. Compared to existing methods, our approach not
only addresses the challenge of character re-identification but also serves as
a foundation for downstream tasks since it can produce character embeddings
without restrictions of face and body availability, enriching the comprehension
of comic books. In our experiments, we leverage two newly curated datasets:
the 'Comic Character Instances Dataset', comprising over a million character
instances, and the 'Comic Sequence Identity Dataset', containing annotations
of identities within more than 3,000 sets of four consecutive comic panels
that we collected.
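The contrastive face-body objective described above can be sketched as follows. This is a generic InfoNCE-style loss, not the paper's exact formulation; the batch size, embedding dimension, and temperature are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(face_emb, body_emb, temperature=0.5):
    """Minimal contrastive loss over face/body pairs: each face embedding
    should be most similar to the body embedding of the same character
    (the positive, on the diagonal) and dissimilar to all other bodies
    (the negatives)."""
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    b = body_emb / np.linalg.norm(body_emb, axis=1, keepdims=True)
    sim = f @ b.T / temperature                  # cosine similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(1)
faces = rng.normal(size=(4, 16))                  # 4 characters, 16-dim
bodies = faces + 0.01 * rng.normal(size=(4, 16))  # near-identical positives
loss_matched = contrastive_loss(faces, bodies)
loss_random = contrastive_loss(faces, rng.normal(size=(4, 16)))
print(loss_matched < loss_random)  # aligned pairs yield the lower loss
```

Training the same network to embed both faces and bodies into one space is what lets the resulting character embeddings work even when only a face or only a body is visible in a panel.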
Approximate solutions for nonlinear oscillation of a mass attached to a stretched elastic wire
In this paper, approximate solutions of the mathematical model of a mass attached to a stretched elastic wire are presented. At the beginning of the study, the equation of motion is derived in detail. He's max–min approach, He's frequency–amplitude method, and the parameter-expansion method are applied to solve the established model. The numerical results are compared with the approximate analytical solutions for both small and large amplitudes of oscillation, and very good agreement is observed. The relative errors are computed to illustrate the strength of agreement between the numerical and approximate analytical results.
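The abstract does not reproduce the equations, so the following is a hedged sketch of the commonly studied dimensionless form of this oscillator (with geometric parameter λ, 0 < λ ≤ 1) and of the bounding idea behind the max–min approach:

```latex
% Dimensionless equation of motion commonly used for a mass attached to a
% stretched elastic wire (u = displacement, A = initial amplitude):
\ddot{u} + u - \frac{\lambda u}{\sqrt{1 + u^{2}}} = 0,
\qquad u(0) = A, \quad \dot{u}(0) = 0, \quad 0 < \lambda \le 1 .

% Writing the equation as \ddot{u} + \omega^{2}(u)\, u = 0 with
% \omega^{2}(u) = 1 - \lambda / \sqrt{1 + u^{2}}, and noting that
% 1 \le \sqrt{1 + u^{2}} \le \sqrt{1 + A^{2}} for |u| \le A,
% the squared frequency is bounded by the extremes of this coefficient:
1 - \lambda \;\le\; \omega^{2} \;\le\; 1 - \frac{\lambda}{\sqrt{1 + A^{2}}} .
```

He's max–min approach then selects an approximate frequency between these two bounds, which is why its accuracy holds for both small and large amplitudes.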
Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings
Drawings are powerful means of pictorial abstraction and communication.
Understanding diverse forms of drawings, including digital arts, cartoons, and
comics, has been a major problem of interest for the computer vision and
computer graphics communities. Although there are large amounts of digitized
drawings from comic books and cartoons, they contain vast stylistic variations,
which necessitate expensive manual labeling for training domain-specific
recognizers. In this work, we show how self-supervised learning, based on a
teacher-student network with a modified student network update design, can be
used to build face and body detectors. Our setup allows exploiting large
amounts of unlabeled data from the target domain when labels are provided for
only a small subset of it. We further demonstrate that style transfer can be
incorporated into our learning pipeline to bootstrap detectors using a vast
amount of out-of-domain labeled images from natural images (i.e., images from
the real world). Our combined architecture yields detectors with
state-of-the-art (SOTA) and near-SOTA performance using minimal annotation
effort.
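The abstract mentions a teacher-student network with a modified student update but does not give the modification's details, so the following shows only the generic exponential-moving-average (EMA) teacher update common in such self-supervised setups; all names and values are illustrative.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """EMA teacher update: the teacher's parameters slowly track the
    student's, providing stable pseudo-targets on unlabeled data."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

# Toy parameters: the teacher starts at zero, the student stays at one.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(100):
    teacher = ema_update(teacher, student)
print(np.round(teacher["w"], 3))  # drifts toward the student's weights
```

In a full pipeline of this kind the student is trained by gradient descent on labeled and pseudo-labeled examples while the teacher, updated only by this averaging rule, generates the pseudo-labels.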
Sketch interpretation using multiscale stochastic models of temporal patterns
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 102-114). Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when we communicate certain ideas, we use sketching as an additional modality to convey ideas that cannot be put into words. The emergence of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. Despite the availability of pen-based information capture hardware, however, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time, and their processing should take advantage of this. This thesis explores ways of doing sketch recognition by extracting as much information as possible from the temporal patterns that appear during sketching.
We present a sketch recognition framework based on hierarchical statistical models of temporal patterns. We show that in certain domains, the stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. We build on this work to show how sketch recognition systems can use knowledge of both common stroke orderings and common object orderings. We describe a statistical framework based on Dynamic Bayesian Networks that can learn temporal models of object-level and stroke-level patterns for recognition. Our framework supports multi-object strokes and multi-stroke objects, and allows interspersed drawing of objects, relaxing the assumption that objects are drawn one at a time. Our system also supports real-valued feature representations using a numerically stable recognition algorithm. We present recognition results for hand-drawn electronic circuit diagrams. The results show that modeling temporal patterns at multiple scales provides a significant increase in correct recognition rates, with no added computational penalties. By Tevfik Metin Sezgin, Ph.D.
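The hierarchical DBN models themselves are beyond an abstract-level sketch, but the core idea of scoring a stroke sequence under a learned temporal model can be illustrated with the simplest such model: an HMM evaluated by the forward algorithm (DBNs generalize HMMs). All probabilities below are made-up toy values.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Forward algorithm for an HMM: returns the log-likelihood of an
    observed stroke-label sequence under the temporal model.
    pi: initial state probabilities; A: state transition matrix;
    B: per-state observation probabilities; obs: observation indices."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return float(np.log(alpha.sum()))

# Toy model: 2 hidden "object" states, 3 observable stroke types.
pi = np.array([0.8, 0.2])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],    # state 0 favors stroke type 0
              [0.1, 0.3, 0.6]])   # state 1 favors stroke type 2
typical = forward_loglik(pi, A, B, [0, 0, 1])
atypical = forward_loglik(pi, A, B, [2, 2, 2])
print(typical > atypical)  # the common stroke ordering scores higher
```

Recognition then amounts to evaluating a candidate stroke sequence under the temporal model learned for each object class and picking the best-scoring class, which is the role the DBNs play at both the stroke and object levels above.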